Skip to content

fix(PLU-373): make notion database test row-order insensitive#714

Merged
awalker4 merged 3 commits into
mainfrom
austin/notion-stable-row-order
May 26, 2026
Merged

fix(PLU-373): make notion database test row-order insensitive#714
awalker4 merged 3 commits into
mainfrom
austin/notion-stable-row-order

Conversation

@awalker4
Copy link
Copy Markdown
Contributor

@awalker4 awalker4 commented May 21, 2026

Summary

Test plan

  • uncategorized_connectors_int_test Notion tests pass (real verification — needs the CI's NOTION_API_KEY)
  • Smoke-tested locally against the real fixture: reordered rows compare equal, a tampered row fails with a readable diff

Summary by cubic

Make the Notion database integration test row-order insensitive and correct a misleading helper comment, eliminating flaky failures when Notion returns rows in different orders. Addresses PLU-373.

  • Bug Fixes
    • Added unordered_table_html_equality_check to compare the header positionally and treat table rows as a multiset; prints a readable symmetric diff on mismatch.
    • Used in test_notion_source_database only; the page test remains unchanged.
    • Fixed the helper comment to state it scans all <tr> elements, not just the first table.
    • Bumped version to 1.6.6-dev and updated CHANGELOG.md; test-only change with no runtime impact.

Written for commit 87d92cc. Summary will update on new commits. Review in cubic

The notion_database integration test compared the downloaded HTML byte-
for-byte against a checked-in fixture, but Notion's database query
endpoint doesn't guarantee a stable row order — every prior fixture
refresh was just reshuffling the same rows. Compare rows as a multiset
of text content and print the symmetric difference on mismatch so real
regressions aren't silent.

Version is bumped to 1.6.2-dev to satisfy check-version without claiming
a release slot for a test-only change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@cubic-dev-ai cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 issue found across 4 files

Shadow auto-approve: would not auto-approve because issues were found.

Fix all with cubic | Re-trigger cubic

Comment thread test/integration/connectors/utils/validation/equality.py Outdated
Comment said the helper inspected the first <table>, but the code scans
every <tr> in the document via soup.find_all("tr").

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@cubic-dev-ai cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

0 issues found across 1 file (changes from recent commits).

Shadow auto-approve: would auto-approve. This change only modifies test infrastructure and documentation: it adds an order-insensitive HTML table equality check for a single Notion integration test, updates CHANGELOG, and bumps a dev version, with no impact on production code, business logic, or critical systems.

Re-trigger cubic

@potter-potter
Copy link
Copy Markdown
Contributor

@claude please review

@claude
Copy link
Copy Markdown

claude Bot commented May 26, 2026

Claude finished @potter-potter's task —— View job


Code Review

Reviewing PR changes to fix flaky Notion database tests...

Todo List:

  • Read and analyze all changed files
  • Review the new unordered_table_html_equality_check function
  • Assess test integration and approach
  • Check version bump and changelog
  • Provide comprehensive review feedback

Analysis of Changes

✅ New unordered_table_html_equality_check Function (equality.py:51-91)

Strengths:

  • Addresses the root issue: Notion's API doesn't guarantee stable row ordering, and this function correctly handles that by treating data rows as a multiset
  • Header preservation: Smart decision to compare the first <tr> positionally as a header while treating subsequent rows as unordered
  • Excellent error reporting: The symmetric difference output (lines 85-89) will make debugging much easier than previous black-box failures
  • Clean implementation: Uses Counter for multiset operations and sorted() for consistent comparison
  • Proper scope: Helper function split_rows() is appropriately scoped and reusable

✅ Test Integration (test_notion.py:65)

  • Correctly applied only to test_notion_source_database where row order is unstable
  • Appropriately left test_notion_source_page unchanged since pages aren't tables
  • Clean import and usage

✅ Version and Documentation

  • Version bump (__version__.py): Appropriate bump to 1.6.2-dev for test-only change
  • Changelog (CHANGELOG.md:1-5): Well-documented entry clearly stating this is test-only with no published behavior impact

Code Quality Assessment

Security: ✅ No security concerns - test infrastructure only
Performance: ✅ Efficient use of Counter and sorted() for multiset comparison
Maintainability: ✅ Clear function name, good docstring, readable implementation
Testing: ✅ Addresses flaky test issue with deterministic comparison

Minor Observations

  1. Error message formatting: The current format is clear and actionable:

    only in expected (x2): row content here  
    only in current  (x1): different row content
    
  2. Edge case handling: Function gracefully handles empty tables (if not rows: return "", [])

  3. Text extraction: Uses get_text(" ", strip=True) for consistent whitespace handling

Recommendation

✅ LGTM - This is a well-implemented fix that directly addresses the flaky test issue. The approach is sound, the implementation is clean, and the error reporting improvement is valuable for future debugging.

The fix correctly identifies that Notion's database query endpoint doesn't guarantee stable row order and implements an appropriate order-insensitive comparison while preserving header semantics.

Copy link
Copy Markdown
Contributor

@potter-potter potter-potter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LG

…row-order

# Conflicts:
#	CHANGELOG.md
#	unstructured_ingest/__version__.py
@awalker4 awalker4 merged commit 6689db0 into main May 26, 2026
37 of 38 checks passed
@awalker4 awalker4 deleted the austin/notion-stable-row-order branch May 26, 2026 16:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants